基于模型的强化学习的关键承诺之一是使用世界内部模型拓展到新颖的环境和任务中的预测。然而,模型的代理商的泛化能力尚不清楚,因为现有的工作在基准测试概括时专注于无模型剂。在这里,我们明确测量模型的代理的泛化能力与其无模型对应物相比。我们专注于Muzero(Schrittwieser等,2020),强大的基于模型的代理商的分析,并评估其在过程和任务泛化方面的性能。我们确定了一个程序概括规划,自我监督代表学习和程序数据分集的三个因素 - 并表明通过组合这些技术,我们实现了普通的最先进的概括性和数据效率(Cobbe等人。,2019)。但是,我们发现这些因素并不总是为Meta-World中的任务泛化基准提供相同的益处(Yu等人,2019),表明转移仍然是一个挑战,可能需要不同的方法而不是程序泛化。总的来说,我们建议建立一个推广的代理需要超越单任务,无模型范例,并朝着在丰富,程序,多任务环境中培训的基于自我监督的模型的代理。
translated by 谷歌翻译
Researchers have demonstrated that neural networks are vulnerable to adversarial examples and subtle environment changes, both of which one can view as a form of distribution shift. To humans, the resulting errors can look like blunders, eroding trust in these agents. In prior games research, agent evaluation often focused on the in-practice game outcomes. While valuable, such evaluation typically fails to evaluate robustness to worst-case outcomes. Prior research in computer poker has examined how to assess such worst-case performance, both exactly and approximately. Unfortunately, exact computation is infeasible with larger domains, and existing approximations rely on poker-specific knowledge. We introduce ISMCTS-BR, a scalable search-based deep reinforcement learning algorithm for learning a best response to an agent, thereby approximating worst-case performance. We demonstrate the technique in several two-player zero-sum games against a variety of agents, including several AlphaZero-based agents.
translated by 谷歌翻译
Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games -the canonical video game environment for testing AI techniques, in which model-based planning approaches have historically struggled -our new algorithm achieved a new state of the art. When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.
translated by 谷歌翻译
Periocular refers to the region of the face that surrounds the eye socket. This is a feature-rich area that can be used by itself to determine the identity of an individual. It is especially useful when the iris or the face cannot be reliably acquired. This can be the case of unconstrained or uncooperative scenarios, where the face may appear partially occluded, or the subject-to-camera distance may be high. However, it has received revived attention during the pandemic due to masked faces, leaving the ocular region as the only visible facial area, even in controlled scenarios. This paper discusses the state-of-the-art of periocular biometrics, giving an overall framework of its most significant research aspects.
translated by 谷歌翻译
Feedforward fully convolutional neural networks currently dominate in semantic segmentation of 3D point clouds. Despite their great success, they suffer from the loss of local information at low-level layers, posing significant challenges to accurate scene segmentation and precise object boundary delineation. Prior works either address this issue by post-processing or jointly learn object boundaries to implicitly improve feature encoding of the networks. These approaches often require additional modules which are difficult to integrate into the original architecture. To improve the segmentation near object boundaries, we propose a boundary-aware feature propagation mechanism. This mechanism is achieved by exploiting a multi-task learning framework that aims to explicitly guide the boundaries to their original locations. With one shared encoder, our network outputs (i) boundary localization, (ii) prediction of directions pointing to the object's interior, and (iii) semantic segmentation, in three parallel streams. The predicted boundaries and directions are fused to propagate the learned features to refine the segmentation. We conduct extensive experiments on the S3DIS and SensatUrban datasets against various baseline methods, demonstrating that our proposed approach yields consistent improvements by reducing boundary errors. Our code is available at https://github.com/shenglandu/PushBoundary.
translated by 谷歌翻译
Cartesian impedance control is a type of motion control strategy for robots that improves safety in partially unknown environments by achieving a compliant behavior of the robot with respect to its external forces. This compliant robot behavior has the added benefit of allowing physical human guidance of the robot. In this paper, we propose a C++ implementation of compliance control valid for any torque-commanded robotic manipulator. The proposed controller implements Cartesian impedance control to track a desired end-effector pose. Additionally, joint impedance is projected in the nullspace of the Cartesian robot motion to track a desired robot joint configuration without perturbing the Cartesian motion of the robot. The proposed implementation also allows the robot to apply desired forces and torques to its environment. Several safety features such as filtering, rate limiting, and saturation are included in the proposed implementation. The core functionalities are in a re-usable base library and a Robot Operating System (ROS) ros_control integration is provided on top of that. The implementation was tested with the KUKA LBR iiwa robot and the Franka Emika Robot (Panda) both in simulation and with the physical robots.
translated by 谷歌翻译
Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as in supply chains (inventory optimization), traffic, and in the transition towards carbon-free energy generation in battery/load/production scheduling in sustainable energy systems. Typically, in these scenarios we want to solve an optimization problem that depends on unknown future values, which therefore need to be forecast. As both forecasting and optimization are difficult problems in their own right, relatively few research has been done in this area. This paper presents the findings of the ``IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling," held in 2021. We present a comparison and evaluation of the seven highest-ranked solutions in the competition, to provide researchers with a benchmark problem and to establish the state of the art for this benchmark, with the aim to foster and facilitate research in this area. The competition used data from the Monash Microgrid, as well as weather data and energy market data. It then focused on two main challenges: forecasting renewable energy production and demand, and obtaining an optimal schedule for the activities (lectures) and on-site batteries that lead to the lowest cost of energy. The most accurate forecasts were obtained by gradient-boosted tree and random forest models, and optimization was mostly performed using mixed integer linear and quadratic programming. The winning method predicted different scenarios and optimized over all scenarios jointly using a sample average approximation method.
translated by 谷歌翻译
Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples and their reasoning capabilities are still much limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key in this method is a modality conversion module, named as DePlot, which translates the image of a plot or chart to a linearized table. The output of DePlot can then be directly used to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than >28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over finetuned SOTA on human-written queries from the task of chart QA.
translated by 谷歌翻译
Pre-training is an effective technique for ensuring robust performance on a variety of machine learning tasks. It typically depends on large-scale crawled corpora that can result in toxic or biased models. Such data can also be problematic with respect to copyright, attribution, and privacy. Pre-training with synthetic tasks and data is a promising way of alleviating such concerns since no real-world information is ingested by the model. Our goal in this paper is to understand what makes for a good pre-trained model when using synthetic resources. We answer this question in the context of neural machine translation by considering two novel approaches to translation model pre-training. Our first approach studies the effect of pre-training on obfuscated data derived from a parallel corpus by mapping words to a vocabulary of 'nonsense' tokens. Our second approach explores the effect of pre-training on procedurally generated synthetic parallel data that does not depend on any real human language corpus. Our empirical evaluation on multiple language pairs shows that, to a surprising degree, the benefits of pre-training can be realized even with obfuscated or purely synthetic parallel data. In our analysis, we consider the extent to which obfuscated and synthetic pre-training techniques can be used to mitigate the issue of hallucinated model toxicity.
translated by 谷歌翻译
Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data. We propose MatCha (Math reasoning and Chart derendering pretraining) to enhance visual language models' capabilities in jointly modeling charts/plots and language data. Specifically, we propose several pretraining tasks that cover plot deconstruction and numerical reasoning which are the key capabilities in visual language modeling. We perform the MatCha pretraining starting from Pix2Struct, a recently proposed image-to-text visual language model. On standard benchmarks such as PlotQA and ChartQA, the MatCha model outperforms state-of-the-art methods by as much as nearly 20%. We also examine how well MatCha pretraining transfers to domains such as screenshots, textbook diagrams, and document figures and observe overall improvement, verifying the usefulness of MatCha pretraining on broader visual language tasks.
translated by 谷歌翻译